[SPARK-14127][SQL] Native "DESC [EXTENDED | FORMATTED] <table>" DDL command #12844
liancheng wants to merge 6 commits into apache:master from liancheng:spark-14127-desc-table
Conversation
Test build #57539 has finished for PR 12844 at commit
Test build #57597 has finished for PR 12844 at commit
Force-pushed from 803f28e to 0bc9f5a
Test build #57601 has finished for PR 12844 at commit
A typo bug, not related to this PR.
Oh wait... it's actually a typo from Hive... 😵 "Fixing" it fails an existing test case.
- Shows partition columns for EXTENDED and FORMATTED
- Shows "Compressed:" field
- Shows data types in lower case
Force-pushed from 0bc9f5a to 9194fe1
Test build #57613 has finished for PR 12844 at commit
Test build #57621 has finished for PR 12844 at commit
Test build #57630 has finished for PR 12844 at commit
    inputFormat: Option[String],
    outputFormat: Option[String],
    serde: Option[String],
    compressed: Boolean,
Is this ever true? If it isn't, we could leave it out.
Nvm. Hive can pass compressed tables.
LGTM
Test build #57729 has finished for PR 12844 at commit
Thanks for the review! I'm merging this to master and branch-2.0.
[SPARK-14127][SQL] Native "DESC [EXTENDED | FORMATTED] <table>" DDL command
## What changes were proposed in this pull request?
This PR implements native `DESC [EXTENDED | FORMATTED] <table>` DDL command. Sample output:
```
scala> spark.sql("desc extended src").show(100, truncate = false)
+----------------------------+---------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------+-------+
|key |int | |
|value |string | |
| | | |
|# Detailed Table Information|CatalogTable(`default`.`src`, ...| |
+----------------------------+---------------------------------+-------+
scala> spark.sql("desc formatted src").show(100, truncate = false)
+----------------------------+----------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------------------------------------+-------+
|key |int | |
|value |string | |
| | | |
|# Detailed Table Information| | |
|Database: |default | |
|Owner: |lian | |
|Create Time: |Mon Jan 04 17:06:00 CST 2016 | |
|Last Access Time: |Thu Jan 01 08:00:00 CST 1970 | |
|Location: |hdfs://localhost:9000/user/hive/warehouse_hive121/src | |
|Table Type: |MANAGED | |
|Table Parameters: | | |
| transient_lastDdlTime |1451898360 | |
| | | |
|# Storage Information | | |
|SerDe Library: |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat: |org.apache.hadoop.mapred.TextInputFormat | |
|OutputFormat: |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat| |
|Num Buckets: |-1 | |
|Bucket Columns: |[] | |
|Sort Columns: |[] | |
|Storage Desc Parameters: | | |
| serialization.format |1 | |
+----------------------------+----------------------------------------------------------+-------+
```
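The formatted output above is a three-column result set (`col_name`, `data_type`, `comment`) built row by row. As a rough, self-contained sketch of that row-building pattern (not Spark's actual `DescribeTableCommand` — the `DescribeRow` type and the signatures here are simplified stand-ins):

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified model of a single output row of DESC FORMATTED.
case class DescribeRow(colName: String, dataType: String, comment: String)

object DescribeSketch {
  // columns:  (name, type) pairs for the table schema
  // metadata: (label, value) pairs for the "Detailed Table Information" section
  def describeFormatted(
      columns: Seq[(String, String)],
      metadata: Seq[(String, String)]): Seq[DescribeRow] = {
    val buffer = ArrayBuffer.empty[DescribeRow]

    def append(name: String, dataType: String, comment: String): Unit =
      buffer += DescribeRow(name, dataType, comment)

    // Regular columns first, with data types shown in lower case.
    columns.foreach { case (name, tpe) => append(name, tpe.toLowerCase, "") }

    // Blank separator row, then the section header, then one row per
    // metadata entry, mirroring the sample output above.
    append("", "", "")
    append("# Detailed Table Information", "", "")
    metadata.foreach { case (k, v) => append(k + ":", v, "") }

    buffer.toSeq
  }
}
```

For example, `describeFormatted(Seq("key" -> "INT", "value" -> "STRING"), Seq("Database" -> "default"))` yields the two column rows, a blank row, the section header, and a `Database: default` row, in that order.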
## How was this patch tested?
A test case is added to `HiveDDLSuite` to check command output.
Author: Cheng Lian <lian@databricks.com>
Closes #12844 from liancheng/spark-14127-desc-table.
(cherry picked from commit f152fae)
Signed-off-by: Cheng Lian <lian@databricks.com>
…data source tables

## What changes were proposed in this pull request?

This is a follow-up of PR #12844. It makes the newly updated `DescribeTableCommand` support data source tables.

## How was this patch tested?

A test case is added to check `DESC [EXTENDED | FORMATTED] <table>` output.

Author: Cheng Lian <lian@databricks.com>

Closes #12934 from liancheng/spark-14127-desc-table-follow-up.

(cherry picked from commit 671b382)
Signed-off-by: Yin Huai <yhuai@databricks.com>
…able properties for data source tables

## What changes were proposed in this pull request?

This is a follow-up of #12934 and #12844. This PR adds a set of utility methods in `DDLUtils` to help extract schema information (user-defined schema, partition columns, and bucketing information) from data source table properties. These utility methods are then used in `DescribeTableCommand` to refine output for data source tables. Before this PR, the aforementioned schema information is only shown as table properties, which are hard to read. Sample output:

```
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|a                           |bigint                                                   |       |
|b                           |bigint                                                   |       |
|c                           |bigint                                                   |       |
|d                           |bigint                                                   |       |
|# Partition Information     |                                                         |       |
|# col_name                  |                                                         |       |
|d                           |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database:                   |default                                                  |       |
|Owner:                      |lian                                                     |       |
|Create Time:                |Tue May 10 03:20:34 PDT 2016                             |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                             |       |
|Location:                   |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|Table Type:                 |MANAGED                                                  |       |
|Table Parameters:           |                                                         |       |
|  rawDataSize               |-1                                                       |       |
|  numFiles                  |1                                                        |       |
|  transient_lastDdlTime     |1462875634                                               |       |
|  totalSize                 |684                                                      |       |
|  spark.sql.sources.provider|parquet                                                  |       |
|  EXTERNAL                  |FALSE                                                    |       |
|  COLUMN_STATS_ACCURATE     |false                                                    |       |
|  numRows                   |-1                                                       |       |
|                            |                                                         |       |
|# Storage Information       |                                                         |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat:                |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Compressed:                 |No                                                       |       |
|Num Buckets:                |2                                                        |       |
|Bucket Columns:             |[b]                                                      |       |
|Sort Columns:               |[c]                                                      |       |
|Storage Desc Parameters:    |                                                         |       |
|  path                      |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|  serialization.format      |1                                                        |       |
+----------------------------+---------------------------------------------------------+-------+
```

## How was this patch tested?

Test cases are added in `HiveDDLSuite` to check command output.

Author: Cheng Lian <lian@databricks.com>

Closes #13025 from liancheng/spark-14127-extract-schema-info.

(cherry picked from commit 8a12580)
Signed-off-by: Yin Huai <yhuai@databricks.com>
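As a hedged illustration of the general technique this follow-up relies on — data source tables persist their schema as string-valued table properties, split across numbered parts so no single value gets too long. The property key names below follow Spark's `spark.sql.sources.schema.*` convention, but treat the exact keys and this helper as an assumption, not the real `DDLUtils` API:

```scala
// Sketch: reassembling a schema string that was stored split across
// numbered table-property parts. The key names are assumptions modeled
// on Spark's "spark.sql.sources.schema.*" convention.
object SchemaProps {
  val NumPartsKey = "spark.sql.sources.schema.numParts"
  val PartPrefix  = "spark.sql.sources.schema.part."

  // Returns the concatenated schema string if the properties contain one.
  def schemaString(props: Map[String, String]): Option[String] =
    props.get(NumPartsKey).map { n =>
      (0 until n.toInt).map(i => props(PartPrefix + i)).mkString
    }
}
```

With properties `numParts -> "2"`, `part.0 -> "{\"fields\":"`, and `part.1 -> "[]}"`, this reassembles the full JSON string; with no schema keys present it returns `None`.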
    describe(relation, buffer)

    append(buffer, "", "", "")
    append(buffer, "# Detailed Table Information", relation.catalogTable.toString, "")
@liancheng To improve the output of `EXPLAIN`, I plan to change the default implementation of `toString` of the `CatalogTable` case class. That will also affect the output of `DESCRIBE EXTENDED`.
I checked what Hive does for `DESCRIBE EXTENDED`; its output follows.
Detailed Table Information Table(tableName:t1, dbName:default, owner:root, createTime:1462627092, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], location:hdfs://6b68a24121f4:9000/user/hive/warehouse/t1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1462627092}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Basically, in the implementation of `toString`, I will try to follow what you did in `describeFormatted`, but the contents will be in a single line. Feel free to let me know if you have any concerns or suggestions. Thanks!
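As a rough illustration of such a single-line `toString` (using a made-up, heavily simplified case class, not the real `CatalogTable`), the idea is to render `field:value` pairs comma-separated inside one `Table(...)` wrapper, much like Hive's output shown above:

```scala
// Hypothetical, simplified stand-in for CatalogTable demonstrating a
// single-line toString: "Table(tableName:..., dbName:..., ...)".
case class SimpleCatalogTable(
    database: String,
    table: String,
    tableType: String,
    inputFormat: Option[String]) {

  override def toString: String = {
    val fields = Seq(
      s"tableName:$table",
      s"dbName:$database",
      s"tableType:$tableType") ++
      inputFormat.map(f => s"inputFormat:$f") // optional fields rendered only when present
    fields.mkString("Table(", ", ", ")")
  }
}
```

Optional fields (here `inputFormat`) are simply omitted when absent, so the single line stays compact.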